/*==============================================================================
NUTS 2006 to WVS to NUTS 2010 crosswalk.do

Outline
1) Convert the proprietary region codes in the World Values Survey (WVS) to NUTS 2010 codes

Note 1: 
WVS regions are coded in a proprietary format that corresponds to NUTS 2006 codes. 

Problem: some NUTS regions changed between 2006 and 2010. 
Documentation of changes between NUTS 2006 and NUTS 2010 is available here: 
http://epp.eurostat.ec.europa.eu/portal/page/portal/nuts_nomenclature/documents/2006-2010.xls

Different types of changes are handled as follows: 
*1) Region merges:
NUTS 2006 regions that merge by 2010 are assigned the same NUTS 2010 code.
 These are later collapsed to average the WVS responses across the merged regions.

*2) Boundary shifts/new regions:
A new NUTS 2010 region made up mostly of the land area of an old NUTS 2006 
region is assigned the WVS values of that NUTS 2006 region. 

*3) Splits:
 Multiple new NUTS 2010 regions created from a single NUTS 2006 region are 
 all assigned the WVS responses of that NUTS 2006 region. 
  
Notes: 
NUTS 1 regions that are neither merged nor the result of code changes:
 FI2 - ÅLAND Islands: Island

 FR9 - DÉPARTEMENTS D'OUTRE-MER: Islands and land in South America

NUTS 2 regions that are neither merged nor the result of code changes:
 ES64 - Ciudad Autónoma de Melilla: Located in Africa

 FI20 - Åland: Island

 FR83 - Corse: Island
 FR91 - Guadeloupe: Island
 FR92 - Martinique: Island
 FR93 - Guyane: Located in South America
 FR94 - Réunion: Island
==============================================================================*/
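
* Sketch of the later collapse step for merged regions (not run in this
* do-file). The variable names wvs_response1 and wvs_response2 are
* hypothetical placeholders: each NUTS 2010 region receives the mean WVS
* response of the NUTS 2006 regions assigned to it.
* collapse (mean) wvs_response1 wvs_response2, by(code2010)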

* ------------------------------------------------------------------------------
* I. Bring in .csv files that contain the WVS proprietary code alongside the
* corresponding NUTS 2006 code (code2006)
* ------------------------------------------------------------------------------

clear all
set more off

local countries_for_analysis "AT BE CH DE DK ES FI FR IT NL SE UK"

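* Import the WVS-to-NUTS 2006 lookup tables for NUTS levels 1-3 and save each
* as .dta (the globals insheet_files and dta_files must be defined beforehand)
forval n=1/3 {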
	insheet using "$insheet_files/wvs_nuts`n'.csv"
	sort country code
	save "$dta_files/wvs_nuts`n'.dta", replace
	clear
}
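
* Optional check (a sketch, not part of the original pipeline): the m:1
* merges below require that country and code2006 uniquely identify the
* observations in wvs_nuts1.dta and wvs_nuts2.dta. This only displays a
* warning; it does not stop the do-file.
forval n=1/2 {
	use "$dta_files/wvs_nuts`n'.dta", clear
	capture isid country code2006, missok
	if _rc {
		display as error "Warning: wvs_nuts`n'.dta is not unique on country code2006"
	}
}
clear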


* ------------------------------------------------------------------------------
* II. Bring in crosswalk between NUTS 2006 and NUTS 2010 regions
* ------------------------------------------------------------------------------

insheet using "$insheet_files/nuts_2006_2010_crosswalk.csv"

* Collect the region name from the column that matches each row's NUTS level
gen region_name=country if nuts_level==0
	replace region_name=nutslevel1 if nuts_level==1
	replace region_name=nutslevel2 if nuts_level==2
	replace region_name=nutslevel3 if nuts_level==3
	
drop country nutslevel1-nutslevel3

* Two-letter country code = first two characters of the NUTS 2010 code
gen country = substr(code2010, 1,2)

* Flag and keep the countries in the analysis sample
quietly gen insample=0
foreach country in `countries_for_analysis' {
	quietly replace insample= 1 if country=="`country'"	
}
keep if insample==1|country=="CH"

tab change

* Drop "Extra-Regio" regions (third character of the NUTS code is Z)
gen extra_regio = substr(code2010,3,1) 
	drop if extra_regio=="Z"
drop extra_regio

tempfile crosswalk
save `crosswalk'

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Code changes to NUTS level 1 regions between 2006 and 2010
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
keep if nuts_level==1

sort country code2006
merge m:1 country code2006 using "$dta_files/wvs_nuts1.dta", gen(in_wvs)

* Switzerland: use the NUTS 2006 code as the NUTS 2010 code
replace code2010=code2006 if country=="CH"
tab code2010 if in_wvs!=3 

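* Italy: NUTS 1 codes ITD and ITE became ITH and ITI in 2010. Drop any rows
* already coded ITH/ITI, then assign those codes to the rows with the 2006
* codes ITD/ITE and flag them as matched to the WVS.
drop if code2010=="ITH"
replace code2010 = "ITH" if code2006=="ITD"
replace in_wvs=3 if code2010 == "ITH"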

drop if code2010=="ITI"
replace code2010 = "ITI" if code2006=="ITE"
replace in_wvs=3 if code2010 == "ITI"

display "NUTS 1 codes not in WVS"
tab code2010 if in_wvs!=3 

* Set nuts_level for all rows, including WVS-only rows added by the merge
replace nuts_level=1
keep code2006 code2010 wvs_code nuts_level

tempfile append_nuts1
save `append_nuts1'

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Code changes to NUTS level 2 regions between 2006 and 2010
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

use `crosswalk', clear

keep if nuts_level==2

sort country code2006
merge m:1 country code2006 using "$dta_files/wvs_nuts2.dta", gen(in_wvs)

* Switzerland: use the NUTS 2006 code as the NUTS 2010 code
replace code2010=code2006 if country=="CH"
tab code2010 if in_wvs!=3 

*1) Merges: DE41 + DE42 -> DE40; FI13 + FI1A -> FI1D
drop if code2010=="DE40 (part)"
drop if code2010=="DE40"
replace code2010 = "DE40" if code2006=="DE41"
replace code2010 = "DE40" if code2006=="DE42"
replace in_wvs=3 if code2006 == "DE41"
replace in_wvs=3 if code2006 == "DE42"

drop if code2010=="FI1D"
replace code2010 = "FI1D" if code2006=="FI13"
replace code2010 = "FI1D" if code2006=="FI1A"
replace in_wvs=3 if code2006 == "FI13"
replace in_wvs=3 if code2006 == "FI1A"

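*2) Boundary shifts/new regions: each new 2010 code takes the WVS values of
*   the 2006 region that makes up most of its area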
drop if code2010=="DED4"
replace code2010="DED4" if code2006=="DED1"
replace in_wvs=3 if code2006=="DED1"

drop if code2010=="DED5"
replace code2010="DED5" if code2006=="DED3"
replace in_wvs=3 if code2006=="DED3"

drop if code2010=="ITH5"
replace code2010="ITH5" if code2006=="ITD5"
replace in_wvs=3 if code2006=="ITD5"

drop if code2010=="ITI3"
replace code2010="ITI3" if code2006=="ITE3"
replace in_wvs=3 if code2006=="ITE3"

drop if code2010=="UKD6"
replace code2010="UKD6" if code2006=="UKD2"
replace in_wvs=3 if code2006=="UKD2"

drop if code2010=="UKD7"
replace code2010="UKD7" if code2006=="UKD5"
replace in_wvs=3 if code2006=="UKD5"

*3) Splits: FI18 -> FI1B + FI1C (both receive the FI18 WVS code)
drop if code2006=="FI18"
replace code2006="FI18" if code2010=="FI1B"
replace code2006="FI18" if code2010=="FI1C"
replace wvs_code= 2460108 if code2010=="FI1B"
replace wvs_code= 2460108 if code2010=="FI1C"
replace in_wvs=3 if code2010=="FI1B"
replace in_wvs=3 if code2010=="FI1C"

display "NUTS 2 codes not in WVS"
tab code2010 if in_wvs!=3 
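
* Optional check (a sketch, not part of the original pipeline): both NUTS
* 2010 halves of the FI18 split should now carry the FI18 WVS code assigned
* above (2460108). This only displays a warning; count stores its result
* in r(N).
quietly count if inlist(code2010, "FI1B", "FI1C") & wvs_code != 2460108
if r(N) > 0 {
	display as error "Warning: FI1B/FI1C rows without WVS code 2460108"
}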

* Set nuts_level for all rows, including WVS-only rows added by the merge
replace nuts_level=2
keep code2006 code2010 wvs_code nuts_level

* ==============================================================================
* Append all together
* ==============================================================================

append using `append_nuts1'

/*
Splits.
This problem affects only the following 14 NUTS regions: FI1B, FI1C, UKD62, 
UKD63, UKE44, UKE45, UKF24, UKF25, UKG36, UKG37, UKG38, UKG39, UKH24, UKH25 
(the UK codes are NUTS 3 regions). 

In this step, all WVS responses are tied to only one of the two regions that 
split (here FI1B). After the collapse, duplicates are created so that one of 
the split regions (FI1B) is tied to one copy and the other (FI1C) to the other.
*/
replace code2010="FI1B" if code2010=="FI1C"
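
* Sketch of the post-collapse duplication described in the comment above.
* It is not run here: it assumes a hypothetical collapsed dataset with one
* row per NUTS 2010 region, whose region identifier is named nuts (as in
* nuts_level.dta below). expand creates the copy; is_copy==1 marks it.
/*
expand 2 if nuts=="FI1B", generate(is_copy)
replace nuts="FI1C" if is_copy==1
drop is_copy
*/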

save "$dta_files/nuts_2006_2010_wvs_crosswalk.dta", replace

* Save the list of unique NUTS 2010 codes with their NUTS level
rename code2010 nuts
keep nuts nuts_level
sort nuts
duplicates drop nuts, force
drop if nuts==""

save "$dta_files/nuts_level.dta", replace
